thermodynamic property
Thermodynamic Prediction Enabled by Automatic Dataset Building and Machine Learning
Liu, Juejing, Anderson, Haydn, Waxman, Noah I., Kovalev, Vsevolod, Fisher, Byron, Li, Elizabeth, Guo, Xiaofeng
New discoveries in chemistry and materials science, together with the ever-expanding volume of requisite knowledge and experimental workload, provide unique opportunities for machine learning (ML) to play a critical role in accelerating research. Here, we demonstrate (1) the use of large language models (LLMs) for automated literature reviews, and (2) the training of an ML model to predict chemical knowledge (thermodynamic parameters). Our LLM-based literature review tool (LMExt) successfully extracted chemical and other information into a machine-readable structure, including stability constants for metal cation-ligand interactions, thermodynamic properties, and broader data types (medical research papers and financial reports), effectively overcoming the challenges inherent in each domain. Using the autonomously acquired thermodynamic data, an ML model was trained with the CatBoost algorithm to accurately predict thermodynamic parameters (e.g., enthalpy of formation) of minerals. This work highlights the transformative potential of integrated ML approaches to reshape chemistry and materials science research.
Keywords: Thermodynamics, Machine Learning, Large Language Model, Data Mining, Database
Introduction
Chemical thermodynamics is fundamental for understanding chemical reactions, proposing novel methods to control these reactions, and predicting chemical equilibria and reactions for new materials. Although scientific breakthroughs occur regularly, contributing to these advances becomes progressively more complex. A typical research project necessitates a comprehensive literature review that covers the current state of the field and identifies knowledge gaps. Subsequently, rigorous experimentation and modeling are performed to fill such gaps or to test hypothesis-driven predictions. Both steps are essential and not unique to chemical research; they are, however, inherently mentally intensive and time-consuming.
Inverse Materials Design by Large Language Model-Assisted Generative Framework
Hao, Yun, Fan, Che, Ye, Beilin, Lu, Wenhao, Lu, Zhen, Zhao, Peilin, Gao, Zhifeng, Wu, Qingyao, Liu, Yanhui, Wen, Tongqi
These authors contributed equally: Yun Hao, Che Fan. Here, we introduce AlloyGAN, a closed-loop framework that integrates Large Language Model (LLM)-assisted text mining with Conditional Generative Adversarial Networks (CGANs) to enhance data diversity and improve inverse design. For metallic glasses, the framework predicts thermodynamic properties with discrepancies of less than 8% from experiments, demonstrating its robustness. By bridging generative AI with domain knowledge and validation workflows, AlloyGAN offers a scalable approach to accelerate the discovery of materials with tailored properties, paving the way for broader applications in materials science. Materials design typically involves two fundamental problems: the forward problem and the inverse problem. The forward problem focuses on understanding the relationship between composition, processing conditions, and material properties. This understanding enables researchers to optimize alloy compositions and processing conditions to achieve enhanced performance. Conversely, the inverse problem is more prevalent in materials design and poses the question: "Given the desired material properties, what composition and processing conditions are required to achieve them?" The inverse problem is particularly challenging for multi-component materials because of the vast composition space and the complex interactions among components. Traditional "trial-and-error" experimental approaches are often prohibitively time-consuming and costly [1] for such problems. Addressing these challenges thus requires innovative approaches that efficiently navigate the composition space and identify optimal solutions for materials design.
On the Robustness of Machine Learning Models in Predicting Thermodynamic Properties: a Case of Searching for New Quasicrystal Approximants
Avilov, Fedor S., Eremin, Roman A., Budennyy, Semen A., Humonen, Innokentiy S.
Although artificial intelligence-assisted modeling of disordered crystals is a widely used and well-tried method of new materials design, the issues of its robustness, reliability, and stability are still unresolved and insufficiently discussed. To highlight this, we composed a series of nested datasets of intermetallic approximants of quasicrystals and trained various machine learning models on each of them. Our qualitative and, more importantly, quantitative assessment of the differences in the predictions clearly shows that different reasonable changes in the training sample can lead to completely different sets of predicted potentially new materials. We also show the advantage of pre-training and propose a simple yet effective trick of sequential training to increase stability.
Perfecting Liquid-State Theories with Machine Intelligence
Recent years have seen a significant increase in the use of machine intelligence for predicting electronic structure, molecular force fields, and the physicochemical properties of various condensed systems. However, substantial challenges remain in developing a comprehensive framework capable of handling a wide range of atomic compositions and thermodynamic conditions. This perspective discusses potential future developments in liquid-state theories that leverage recent advances in functional machine learning. By harnessing the strengths of theoretical analysis and machine learning techniques, including surrogate models, dimension reduction, and uncertainty quantification, we envision that liquid-state theories will gain significant improvements in accuracy, scalability, and computational efficiency, enabling broader applications across diverse materials and chemical systems.
Efficient Chemical Space Exploration Using Active Learning Based on Marginalized Graph Kernel: an Application for Predicting the Thermodynamic Properties of Alkanes with Molecular Simulation
Xiang, Yan, Tang, Yu-Hang, Gong, Zheng, Liu, Hongyi, Wu, Liang, Lin, Guang, Sun, Huai
We introduce an explorative active learning (AL) algorithm based on Gaussian process regression and a marginalized graph kernel (GPR-MGK) to explore chemical space at minimum cost. Using high-throughput molecular dynamics simulation to generate data and a graph neural network (GNN) for prediction, we constructed an active learning molecular simulation framework for thermodynamic property prediction. Specifically, targeting 251,728 alkane molecules consisting of 4 to 19 carbon atoms and their liquid physical properties (densities, heat capacities, and vaporization enthalpies), we use the AL algorithm to select the most informative molecules to represent the chemical space. Validation on computational and experimental test sets shows that only 313 (0.124\% of the total) molecules were sufficient to train an accurate GNN model with $\rm R^2 > 0.99$ for computational test sets and $\rm R^2 > 0.94$ for experimental test sets. We highlight two advantages of the presented AL algorithm: compatibility with high-throughput data generation and reliable uncertainty quantification.
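To illustrate the kind of uncertainty-driven selection this abstract describes, the following is a minimal sketch and not the authors' implementation. It assumes a one-dimensional numeric descriptor per candidate and swaps in a plain RBF kernel for the marginalized graph kernel. The selection criterion is the noise-free Gaussian process posterior variance, $\sigma^2(x) = k(x,x) - k_S(x)^\top K_S^{-1} k_S(x)$ for a selected set $S$, updated incrementally with a pivoted Cholesky factorization. The names `rbf`, `select_informative`, `length`, and `tol` are hypothetical.

```python
import math


def rbf(a, b, length=1.0):
    """RBF kernel on scalars (assumed stand-in for a graph kernel)."""
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))


def select_informative(pool, length=1.0, tol=1e-2, max_points=None):
    """Greedily pick the pool point with the largest GP posterior variance.

    Returns the selected indices and the final posterior variance of every
    pool point. Selection stops once the largest remaining variance is
    below `tol` or `max_points` have been chosen.
    """
    n = len(pool)
    var = [rbf(x, x, length) for x in pool]  # prior variance of each point
    cols = []    # pivoted-Cholesky columns, one per selected point
    chosen = []
    limit = n if max_points is None else min(max_points, n)
    while len(chosen) < limit:
        p = max(range(n), key=lambda i: var[i])  # most uncertain candidate
        if var[p] < tol:
            break
        pivot = math.sqrt(var[p])
        col = []
        for i in range(n):
            # Covariance of points i and p, conditioned on earlier selections.
            k = rbf(pool[i], pool[p], length)
            k -= sum(c[i] * c[p] for c in cols)
            col.append(k / pivot)
        cols.append(col)
        chosen.append(p)
        # Conditioning on point p lowers the variance everywhere else.
        var = [max(v - c * c, 0.0) for v, c in zip(var, col)]
    return chosen, var


if __name__ == "__main__":
    # Hypothetical pool: 101 evenly spaced descriptor values on [0, 10].
    pool = [i * 0.1 for i in range(101)]
    chosen, var = select_informative(pool, length=1.0, tol=1e-2)
    print(f"selected {len(chosen)} of {len(pool)} candidates")
    print(f"largest remaining variance: {max(var):.4f}")
```

With fixed kernel hyperparameters, the posterior variance does not depend on the target values, so in this sketch the selection can run before any labels are generated. A faithful reproduction would replace `rbf` with a kernel defined on molecular graphs and would add whatever stopping rule the original work uses.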